perm filename SEARCH[E,ALS] blob
sn#139394 filedate 1975-01-14 generic text, type C, neo UTF8
COMMENT ā VALID 00003 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 Stan's scheme with modifications
C00008 00003 REG' scheme.
C00009 ENDMK
Cā;
Stan's scheme with modifications
This scheme seems to be most suited to search against strings of moderate
length altho it should work for any length. It is not particularly good on
pathological cases but on the usual string found in text or in programs it
should be a real improvement on the scheme now in ETV.
One starts by generating a table of JRSTS indexed on a dispatch value probably
in index DSP which references code. The string to match, here called STRING,
is scanned from left to right and for each letter in the string the position in
the string is written in the address field of the table, overwriting any earlier
entry so that one is left with the last occurance of each letter. Delimiters
and illegal characters are handled separately with appropiate addresses.
The text is then scanned starting with the Nth character where N is the number
of non-delimiting characters in STRING, and the command in the table at the
location using C as the index is executed. This sends control to appropiate
code to back up along the text to the character after the found character and
a step by step ILDB and CAIE C,"char" comparison will be made. If the initial
character tested was not in STRING the XCT will cause the text location to be
advanced by N positions, where the process is repeated. Thus as long as one
lands on characters that are not in STRING one steps along by N characters at
each step.
Returning to a consideration of the case when the initial text character was one
that existed in STRING, as mentioned earlier, one starts a character-by-character
compare. If this is successful all the way through STRING then a hit has been
found. The more usual case will be for the step-by-step compare to fail either
the very first time or fairly early in this process. When a failure does
occur say at letter M, then two different courses of action are reasonable.
Some slight saving in subsequent processing might be avoided if one had saved
additional location numbers for all characters that occur more than once in
STRING but it is doubtful if this saving would exceed the loss in time that
would occur every time one made a test to see if multiple occurances were
involved. The more straight forward proceedure might be to spend a little extra
time when ever a partial match fails and not advance quite as far as one could
otherwise do. The simple way to do this when a step-by step
mismatch occurs would be to then test the mismatched text character to see if
it occurrs any place in STRING at all. If it does not then one can advance the
text N positions fron the mismatch location. If on the other hand, this character
does occur then one can only advance one beyond the text location where the
initial test had been made. In either case the process is then repeated.
One should observe that the longer the STRING is then the greater is the
probability that and given text character will exist somewhere in the string
but the greater is distance that one can skip when a non-string character is
tested. Without a detailed analysis it would seem that there might be an
optimum length of string where this scheme works best. It wouls be useless for
a single character search since the overhead would be for naught and as the
probability of having any given character in the string approached unity, this
would also be true.
REG' scheme.
HRRZ B,A
MOVNS B
HRLZS B
START: MOVE T,WORD(B)
EQV T,[BYTE(7)C,C,C,C,C]
MOVE TT,T
ADD TT,[BYTE(7)1,1,1,1,1,]
JOV YES
EQV T,TT
TDNE [BYTE(7)1,1,1,1,1]
JRST YES
AOBJN B,START
YES: ;ENTER USUAL BYTE BY BYTE SEARCH